CMPU4091 Visualizing Data Assignment

Improved mental health policies improves productivity

Author

C21380786 Matthew Bradon

Published

March 25, 2025

Introduction

Mental health has become an increasingly important topic in modern workplaces. Employees in tech companies often face tight deadlines that can exacerbate mental health challenges. This project explores the relationship between workplace benefits and mental health outcomes for employees in the tech sector.

The original goal of this analysis is to determine whether supportive workplace policies such as remote work, access to mental health care options, wellness programs, and flexible leave policies correlate with improved mental health outcomes. Specifically, examing whether these benefits increase the likelihood of employees seeking treatment and reduce self-reported work interference due to mental health conditions. However what was found was negiligble proof and even contradictory evidence that mental health policies had an effect on mental health related work interference.

The dataset used for is the analysis a 2014 Mental Health in Tech Survey found on kaggle (https://www.kaggle.com/datasets/osmi/mental-health-in-tech-survey). Transformations on gender and age were done, as well as an age group was added for better analysis.

  • Some of the following columns are:

    • family_history: Do you have a family history of mental illness?

    • treatment: Have you sought treatment for a mental health condition?

    • work_interfere: If you have a mental health condition, do you feel that it interferes with your work?

    • no_employees: How many employees does your company or organization have?

    • remote_work: Do you work remotely (outside of an office) at least 50% of the time?

    • tech_company: Does your employer provide mental health benefits?

    • care_options: Do you know the options for mental health care your employer provides?

    • wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?

    • seek_help Does your employer provide resources to learn more about mental health issues and how to seek help?

    • anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?

    • leave: How easy is it for you to take medical leave for a mental health condition?

    • mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?

    • phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?

    • coworkers: Would you be willing to discuss a mental health issue with your coworkers?

    • supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?

    • mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?

    • phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?

    • mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?

    • obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?

Data Preparation and Cleaning

Code
# Load and clean dataset
mental_health <- read_csv("survey.csv") %>%
  janitor::clean_names()
Rows: 1259 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (25): Gender, Country, state, self_employed, family_history, treatment,...
dbl   (1): Age
dttm  (1): Timestamp

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
cat(strwrap(toString(unique(mental_health$gender)), width = 80), sep = "\n")
Female, M, Male, male, female, m, Male-ish, maile, Trans-female, Cis Female, F,
something kinda male?, Cis Male, Woman, f, Mal, Male (CIS), queer/she/they,
non-binary, Femake, woman, Make, Nah, All, Enby, fluid, Genderqueer, Androgyne,
Agender, cis-female/femme, Guy (-ish) ^_^, male leaning androgynous, Man, Trans
woman, msle, Neuter, Female (trans), queer, Female (cis), Mail, cis male, A
little about you, Malr, p, femail, Cis Man, ostensibly male, unsure what that
really means
Code
# Data cleaning and transformation
mental_health <- mental_health %>%
  filter(!is.na(age) & age > 15 & age < 100) %>%
  mutate(
    gender = str_to_lower(gender),
    gender = case_when(
      str_detect(gender, "female|f") ~ "Female",
      str_detect(gender, "male|m") ~ "Male",
      TRUE ~ "Other"
    ),
    treatment = ifelse(treatment == "Yes", "Treated", "Not Treated"),
    age_group = cut(
      age,
      breaks = c(seq(10, 60, by = 5), Inf),  # 10, 15, ..., 60, Inf
      labels = c("10–15", "16–20", "21–25", "26–30", "31–35", "36–40", "41–45", "46–50", "51–55", "56–60", "60+"),
      right = TRUE,
      include.lowest = TRUE
    )
  )

Gender has various different unique entries ranging from different ways of representing the two main sexes (M, m, Male, male) to Queer identities. There were some mispellings such as msle and mail. Some of the queer identities include Enby, Genderqueer, Agender, Androgyne, Trans-female. There are different spellings on the same identity i.e trans-female, Trans woman, and Female (trans). Some people also wrote Cis male or female cis. To be able to group programatically the different queer identities would be difficult and not scalable thus they will be grouped into other. Any correct spelling of male or female along with F or M will be grouped as male and female. This kind of problem could be avoided if the instead of taking a string input in the survey you used bulletpoints and if necessary an other text input.

I filtered for rows in the age range of 10 to 100. There were outliers such as 999999, -1, 5 and 8 which are invalid. The age_group column was added to explore how different age groups feel. The age groups are grouped in increments of 5 starting from 10 to 60+.

There is a states column which is states for the USA. However there are non US countires where there is no states value for them.

Data Overview

Code
datatable(mental_health)

Data Exploration

Below is the unique values for each column and the count.

Code
for (col in names(mental_health)) {
  if (col %in% c("timestamp", "comments")) {
    next
  }
  cat("Value counts for:", col, "\n")
  print(table(mental_health[[col]], useNA = "ifany"))
  cat("\n---------------------------\n\n")
}
Value counts for: age 

18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 
 7  9  6 16 21 51 46 61 75 71 68 85 63 67 82 70 65 55 37 43 39 33 33 21 20 28 
44 45 46 47 48 49 50 51 53 54 55 56 57 58 60 61 62 65 72 
11 12 12  2  6  4  6  5  1  3  3  4  3  1  2  1  1  1  1 

---------------------------

Value counts for: gender 

Female   Male  Other 
   247    994     10 

---------------------------

Value counts for: country 

             Australia                Austria                Belgium 
                    21                      3                      6 
Bosnia and Herzegovina                 Brazil               Bulgaria 
                     1                      6                      4 
                Canada                  China               Colombia 
                    72                      1                      2 
            Costa Rica                Croatia         Czech Republic 
                     1                      2                      1 
               Denmark                Finland                 France 
                     2                      3                     13 
               Georgia                Germany                 Greece 
                     1                     45                      2 
               Hungary                  India                Ireland 
                     1                     10                     27 
                Israel                  Italy                  Japan 
                     5                      7                      1 
                Latvia                 Mexico                Moldova 
                     1                      3                      1 
           Netherlands            New Zealand                Nigeria 
                    27                      8                      1 
                Norway            Philippines                 Poland 
                     1                      1                      7 
              Portugal                Romania                 Russia 
                     2                      1                      3 
             Singapore               Slovenia           South Africa 
                     4                      1                      6 
                 Spain                 Sweden            Switzerland 
                     1                      7                      7 
              Thailand         United Kingdom          United States 
                     1                    184                    746 
               Uruguay 
                     1 

---------------------------

Value counts for: state 

  AL   AZ   CA   CO   CT   DC   FL   GA   IA   ID   IL   IN   KS   KY   LA   MA 
   7    7  138    9    4    4   15   12    4    1   28   27    3    5    1   20 
  MD   ME   MI   MN   MO   MS   NC   NE   NH   NJ   NM   NV   NY   OH   OK   OR 
   8    1   22   20   12    1   14    2    3    6    2    3   57   27    6   29 
  PA   RI   SC   SD   TN   TX   UT   VA   VT   WA   WI   WV   WY <NA> 
  29    1    5    3   45   44   11   14    3   70   12    1    2  513 

---------------------------

Value counts for: self_employed 

  No  Yes <NA> 
1091  142   18 

---------------------------

Value counts for: family_history 

 No Yes 
762 489 

---------------------------

Value counts for: treatment 

Not Treated     Treated 
        619         632 

---------------------------

Value counts for: work_interfere 

    Never     Often    Rarely Sometimes      <NA> 
      212       140       173       464       262 

---------------------------

Value counts for: no_employees 

           1-5        100-500         26-100       500-1000           6-25 
           158            175            288             60            289 
More than 1000 
           281 

---------------------------

Value counts for: remote_work 

 No Yes 
880 371 

---------------------------

Value counts for: tech_company 

  No  Yes 
 226 1025 

---------------------------

Value counts for: benefits 

Don't know         No        Yes 
       407        371        473 

---------------------------

Value counts for: care_options 

      No Not sure      Yes 
     499      313      439 

---------------------------

Value counts for: wellness_program 

Don't know         No        Yes 
       187        837        227 

---------------------------

Value counts for: seek_help 

Don't know         No        Yes 
       363        641        247 

---------------------------

Value counts for: anonymity 

Don't know         No        Yes 
       815         64        372 

---------------------------

Value counts for: leave 

        Don't know Somewhat difficult      Somewhat easy     Very difficult 
               561                125                265                 97 
         Very easy 
               203 

---------------------------

Value counts for: mental_health_consequence 

Maybe    No   Yes 
  476   487   288 

---------------------------

Value counts for: phys_health_consequence 

Maybe    No   Yes 
  273   920    58 

---------------------------

Value counts for: coworkers 

          No Some of them          Yes 
         258          771          222 

---------------------------

Value counts for: supervisor 

          No Some of them          Yes 
         390          349          512 

---------------------------

Value counts for: mental_health_interview 

Maybe    No   Yes 
  207  1003    41 

---------------------------

Value counts for: phys_health_interview 

Maybe    No   Yes 
  555   496   200 

---------------------------

Value counts for: mental_vs_physical 

Don't know         No        Yes 
       574        338        339 

---------------------------

Value counts for: obs_consequence 

  No  Yes 
1070  181 

---------------------------

Value counts for: age_group 

10–15 16–20 21–25 26–30 31–35 36–40 41–45 46–50 51–55 56–60   60+ 
    0    22   195   362   339   185    92    30    12    10     4 

---------------------------

Country breakdown

Code
# Count responses by country
country_counts <- mental_health %>%
  count(country, sort = TRUE)

# Calculate percentages
country_counts <- country_counts %>%
  mutate(
    pct = n / sum(n),
    pct_label = paste0(round(pct * 100, 1), "%")
  )

# Generate packed circle layout
packing <- circleProgressiveLayout(country_counts$n, sizetype = 'area')

# Combine layout with data
country_counts <- bind_cols(country_counts, packing)

# Generate circle vertices for plotting
circle_data <- circleLayoutVertices(packing, npoints = 50)

# Create a label column for text + tooltip
country_counts <- country_counts %>%
  mutate(
    label = ifelse(n > 3,
                   paste0(country, "\n", n, " Responses\n", pct_label),
                   "")
  )

# Create plot
p_bubble <- ggplot() +
  geom_polygon(data = circle_data,
               aes(x, y, group = id, fill = as.factor(id)),
               color = "white", alpha = 0.7, show.legend = FALSE) +
  geom_text(data = country_counts,
            aes(x = x, y = y, label = label),
            size = 3, color = "black", fontface = "bold", lineheight = 0.9) +
  coord_equal() +
  theme_void() +
  labs(title = "Survey Responses by Country (Packed Bubble Chart)")

# Make interactive with proper tooltip
ggplotly(p_bubble, tooltip = "label")

59.6% of the survey responses come from the USA with United Kingdom (14.7) and Canada (5.7%) being the next two largest. This introduces bias as how americans interact with their mental health in the workplace may be different to how other european countries work as well as eastern countries.

Age Distribution

Code
p1 <- ggplot(mental_health, aes(x = age)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "black") +
  labs(title = "Age Distribution of Survey Respondents",
       x = "Age",
       y = "Count")
ggplotly(p1)

Most of the dataset is people whos age range from 20 to 40. This means the tech sector is mostly young workers in their careers.

Gender Distribution

Code
p2 <- ggplot(mental_health, aes(x = gender, fill = gender)) +
  geom_bar() +
  labs(
    title = "Gender Distribution of Survey Respondents",
    x = "Gender",
    y = "Count",
    fill = "Gender"
  ) +
  theme_minimal()
ggplotly(p2)

There is significantly more men than women in the dataset which is to be expected as tech is a male dominated field.

Correlation Heatmap with treatment

Code
# Select variables and convert to numeric
cor_data <- mental_health %>%
  select(
    work_interfere,
    age,
    treatment,
    remote_work,
    tech_company,
    care_options,
    wellness_program,
    seek_help,
    anonymity,
    leave,
    mental_health_consequence,
    phys_health_consequence,
    coworkers,
    supervisor,
    mental_vs_physical,
    obs_consequence
  ) %>%
  mutate(across(everything(), ~ as.numeric(as.factor(.))))  # convert all to numeric

# Compute correlation matrix
cor_matrix <- cor(cor_data, use = "complete.obs") %>% round(2)

# Interactive heatmap with heatmaply
heatmaply(
  cor_matrix,
  main = "Interactive Correlation Heatmap of Workplace Mental Health Factors",
  colors = colorRampPalette(c("#E46726", "white", "#6D9EC1"))(100),
  xlab = "", ylab = "",
  margins = c(60, 60, 40, 20),
  grid_color = "grey80",
  plot_method = "plotly"
)

Most of the moderate correlations have to do with company culture questions for example if your anonymity is protected in regards to mental health your company tends to support mental health related leave.

1. Treatment by Age Group

Code
p3 <- ggplot(mental_health, aes(x = age_group, fill = treatment)) +
  geom_bar(position = "fill") +
  labs(title = "Proportion of Treatment by Age Group",
       x = "Age Group",
       y = "Proportion") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(p3)

This was suprising as my assumption would be the younger generations would get treated to mental health as its less stigmatized. But clearly see an increase in the proportion of those who get treated for mental health as the age group increases.

2. Treatment by Gender

Code
p4 <- ggplot(mental_health, aes(x = gender, fill = treatment)) +
  geom_bar(position = "fill") +
  labs(title = "Proportion of Treatment by Gender",
       x = "Gender",
       y = "Proportion") +
  scale_y_continuous(labels = scales::percent_format())
ggplotly(p4)

I expected the number of treatment for mental health for the genders to be around the same with a slight edge in more females. However I did not expect that 23.76% more females get treated. The Other gender is at 90% however this includes some cis males and cis females and the sample size is 10 which is not enough to draw any real conclusions

3. Treatment by Country

Code
top_countries <- mental_health %>%
  count(country, sort = TRUE) %>%
  top_n(10, n) %>%
  pull(country)

mental_health_top <- mental_health %>%
  filter(country %in% top_countries)

p5 <- ggplot(mental_health_top, aes(x = country, fill = treatment)) +
  geom_bar(position = "fill") +
  labs(title = "Proportion of Treatment by Top 10 Countries",
       x = "Country",
       y = "Proportion") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(p5)

As discussed above the majority of the dataset belongs to the USA followed by the UK and Canada. They have similar ratios with the USA being 54.69% treated and 45.31% not treated. I did not expect that the majority would be treated for mental health.

4. Treatment vs Remote Work

Code
p_remote_treatment <- mental_health %>%
  filter(!is.na(remote_work), !is.na(treatment)) %>%
  ggplot(aes(x = remote_work, fill = treatment)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Mental Health Treatment vs Remote Work",
    x = "Remote Work",
    y = "Proportion",
    fill = "Sought Treatment"
  )

ggplotly(p_remote_treatment)

No signifigant difference between the rate of being treated and remote work.

5. Work Interference vs Mental Health Benefits

Code
p_benefits_interfere <- mental_health %>%
  filter(!is.na(tech_company), !is.na(work_interfere)) %>%
  ggplot(aes(x = tech_company, fill = work_interfere)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Work Interference vs Mental Health Benefits",
    x = "Company Provides Mental Health Benefits",
    y = "Proportion",
    fill = "Interference With Work"
  )

ggplotly(p_benefits_interfere)

When a company has mental health benefits, employees who report Never grows by 2.3% and Often shrinks by 3.87%, Sometimes grows by 4.78% and Rarely shrinks by 3.2%. This suggests that there may be a slight improvement in work interference when a company has good mental health benefits

6. Leave Policy vs Seeking Treatment

Code
p_leave_treatment <- mental_health %>%
  filter(!is.na(leave), !is.na(treatment)) %>%
  mutate(leave = factor(leave, levels = c(
    "Don't know",
    "Very difficult",
    "Somewhat difficult",
    "Somewhat easy",
    "Very easy"
  ))) %>%
  ggplot(aes(x = leave, fill = treatment)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Ease of Taking Leave vs Seeking Treatment",
    x = "Perceived Ease of Taking Medical Leave",
    y = "Proportion",
    fill = "Sought Treatment"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p_leave_treatment)

It is interesting to see that more people who get treated are in companies that find it hard to get mental health realted issues.

7. Company Mental Health Resources vs Willingness to Talk to Supervisor

Code
p_resources_vs_supervisor <- mental_health %>%
  filter(!is.na(care_options), !is.na(supervisor)) %>%
  ggplot(aes(x = care_options, fill = supervisor)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Awareness of Mental Health Care Options vs Willingness to Talk to Supervisor",
    x = "Aware of Mental Health Care Options",
    y = "Proportion",
    fill = "Would Talk to Supervisor"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p_resources_vs_supervisor)

Employees are more willing to talk to their supervisors about mental health if they are aware of company mental health resources. This could be due to the company making their mental health care options clear. It does also stand that if an employee knows the company does not have mental health resources they would not discusss their issues with their supervisor.

Further exploratory with breakdown by age group and gender

1. Treatment by Gender and Age Group

Code
p1 <- ggplot(mental_health, aes(x = age_group, fill = treatment)) +
  geom_bar(position = "fill") +
  facet_wrap(~ gender) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Mental Health Treatment by Age Group and Gender",
    x = "Age Group",
    y = "Proportion",
    fill = "Treatment Status"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p1)

Above we explored mental health treatement by age group and gender individually and found it generally increased with age and females got treated more. It is expected when visualizing by age group and gender we see the exact same trend. What is interesting is the majority of females getting treated, there is little growth in treatment as age increases compared to the male counterpart.

2. Work Interference by Gender and Age Group

Code
p2 <- mental_health %>%
  filter(!is.na(work_interfere), !is.na(age_group), !is.na(gender)) %>%
  ggplot(aes(x = age_group, fill = work_interfere)) +
  geom_bar(position = "fill") +
  facet_wrap(~ gender) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Perceived Work Interference by Age Group and Gender",
    x = "Age Group",
    y = "Proportion",
    fill = "Work Interference"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p2)

Besides the later age groups (which have less people seen in the age distribution) there does not seem to be a visible pattern to work interference and age group and gender

3. Leave Policy vs Treatment by Gender

Code
p3 <- mental_health %>%
  filter(!is.na(leave), !is.na(treatment), !is.na(gender)) %>%
  mutate(leave = factor(leave, levels = c(
    "Don't know",
    "Very difficult",
    "Somewhat difficult",
    "Somewhat easy",
    "Very easy"
  ))) %>%
  ggplot(aes(x = leave, fill = treatment)) +
  geom_bar(position = "fill") +
  facet_wrap(~ gender) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Ease of Taking Leave vs Treatment, by Gender",
    x = "Ease of Taking Leave",
    y = "Proportion",
    fill = "Treatment"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p3)

As mentioned above there was the trend with the number of people treated seemed to decrease with the increase ease of mental health leave. However when viewed through the lens of gender we see this true for males but not for females.

4. Remote Work and Treatment by Age Group

Code
p4 <- ggplot(mental_health, aes(x = remote_work, fill = treatment)) +
  geom_bar(position = "fill") +
  facet_wrap(~ age_group) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Remote Work vs Treatment by Age Group",
    x = "Remote Work",
    y = "Proportion",
    fill = "Treatment"
  )

ggplotly(p4)

Again remote work does not seem to have any effect on the proportion of treatment even when broken into age groups.

Big Idea

Employers who provide strong mental health workplace benefits such as remote work, supportive leave and access to mental health resources create an environment where employees are more likely to seek mental health treatment and report lower levels of work interference. This is important as work interference effects productivity. Companies can take actionable step in the form of workplace policeis that improve both human and business outcomes.

However from the exploration of the data, it has shown that supportive mental health policies had a negligible effect on workplace interference and likelihood of employee going for treatment. The below visualizations will show how these policies effect work interference.

Explanatory Visualizations

From this analysis implemnting supportive mental health policies in a workplace has negligble effect on work interference related to mental health.

Code
p1_data <- mental_health %>%
  filter(!is.na(treatment), !is.na(work_interfere)) %>%
  group_by(treatment, work_interfere) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(treatment) %>%
  mutate(total = sum(n),
         prop = n / total,
         se = sqrt(prop * (1 - prop) / total),
         ci_low = prop - 1.96 * se,
         ci_high = prop + 1.96 * se)

# Plot
p1 <- ggplot(p1_data, aes(x = treatment, y = prop, fill = work_interfere)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high),
                position = position_dodge(width = 0.9),
                width = 0.2) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_brewer(palette = "RdYlBu", direction = -1) +
  labs(
    title = "Work Interference by Mental Health Treatment Status",
    subtitle = "With 95% confidence intervals for each group",
    x = "Treatment Status",
    y = "Proportion",
    fill = "Work Interference Frequency"
  ) +
  theme_minimal()

ggplotly(p1)

Interpretation:
Employees who are receiving mental health treatement are much more likely to have work interference due to mental health. 50.4% of not treated employees never have work interference while only 5 percent of treated employees never have work interference due to mental health. My intuition would be that people who are being treated for mental health would be less likely to have mental health work interference. However people who arent being treated for mental health may not have mental health problems and it would make sense that they would not have interference if they did not have mental health problems. Also people who are being treated for mental health do have mental health problems and it that cohourt of the sample would have more mental health related work interference. For better analysis surveying people who have mental issues


Code
p2_data <- mental_health %>%
  filter(!is.na(tech_company), !is.na(work_interfere), !is.na(gender)) %>%
  filter(gender != "Other") %>%  # Exclude "Other" gender category
  group_by(gender, tech_company, work_interfere) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(gender, tech_company) %>%
  mutate(total = sum(n),
         prop = n / total,
         se = sqrt(prop * (1 - prop) / total),
         ci_low = prop - 1.96 * se,
         ci_high = prop + 1.96 * se)

# Plot with facets
p2 <- ggplot(p2_data, aes(x = tech_company, y = prop, fill = work_interfere)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high),
                position = position_dodge(width = 0.9),
                width = 0.2) +
  facet_wrap(~ gender) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_brewer(palette = "RdYlBu", direction = -1) +
  labs(
    title = "Impact of Company Mental Health Benefits on Work Interference",
    subtitle = "Includes 95% CI; grouped by gender and company benefit",
    x = "Company Provides Mental Health Benefits",
    y = "Proportion",
    fill = "Work Interference"
  ) +
  theme_minimal()

ggplotly(p2)

Interpretation:
This graph shows how company benefits and work interference relate to each other broken down by gender. An interesting fact to note is that uncertainty for the females is much higher than it is for males. This could be due to females make up 19.74% of the survey and with a 4 times smaller sample size introducing more uncertainty. In data exploration we saw that there was a very slight improvement in work interference when there are mental health benefits. But when account for uncertainty and doing a breakdown by gender that improvement seems to be negligble. It is interesting that for females, work interference appears to be worse with those who report Never decreasing by 5.1%. Males however appear to report less work interference with the mental health benefits. Due to the very small improvements and the large amount of uncertainty when broken down by gender I cannot conclude that work place benefits effect work interference.


Code
p3 <- mental_health %>%
  filter(!is.na(leave), !is.na(work_interfere), !is.na(age_group)) %>%
  filter(!age_group %in% c("16–20", "60+")) %>%
  mutate(leave = factor(leave, levels = c(
    "Don't know",
    "Very difficult",
    "Somewhat difficult",
    "Somewhat easy",
    "Very easy"
  ))) %>%
  ggplot(aes(x = leave, fill = work_interfere)) +
  geom_bar(position = "fill") +
  facet_wrap(~ age_group, ncol = 3) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_brewer(palette = "RdYlBu", direction = -1) +
  labs(
    title = "Work Interference Across Leave Policies by Age Group",
    subtitle = "Younger workers are more affected by poor leave policies",
    x = "Ease of Taking Leave for Mental Health",
    y = "Proportion",
    fill = "Work Interference"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1),
    panel.spacing = unit(1.5, "lines")
  )

ggplotly(p3)

Interpretation:
In this visualization some of the age groups were excluded due to the very small sample size. Across each age group there is no discernable pattern amerging. The proportion of those who report never appears to grow larger as taking leave becomes easier but in some age groups so does the proportion of Often. This could just be random noise and not an actual pattern in the dataset.

Conclusion

In conclusion, I set out to see how mental health polices reduce work intereference due to mental health however found that it did not have an effect signifigant enough to definitvely say. We did a breakdown by age group and gender which gave a deeper insight such as how benefits vs interference in a single variate analysis showed a slight increase but through the lens of gender the improvement vanished.